41 research outputs found

    Deep Convolutional Ranking for Multilabel Image Annotation

    Multilabel image annotation is one of the most important challenges in computer vision with many real-world applications. While existing work usually uses conventional visual features for multilabel annotation, features based on Deep Neural Networks have shown potential to significantly boost performance. In this work, we propose to leverage the advantage of such features and analyze key components that lead to better performance. Specifically, we show that a significant performance gain can be obtained by combining convolutional architectures with approximate top-k ranking objectives, as they naturally fit the multilabel tagging problem. In our experiments on the NUS-WIDE dataset, this approach outperforms conventional visual features by about 10%, obtaining the best reported performance in the literature.
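
To make the idea of a ranking objective for multilabel tagging concrete, here is a minimal sketch of a plain pairwise hinge ranking loss. It is a simple stand-in chosen for illustration, not necessarily the approximate top-k objective used in the paper; the function names, the `margin` parameter, and the toy scores are all hypothetical.

```python
def pairwise_ranking_loss(scores, positives, margin=1.0):
    """Sum of hinge penalties over every (positive, negative) label pair.

    scores:    one real-valued score per label (e.g. network outputs)
    positives: indices of the labels that truly apply to the image
    """
    pos = set(positives)
    neg = [j for j in range(len(scores)) if j not in pos]
    loss = 0.0
    for p in pos:
        for n in neg:
            # Penalize a negative label scoring within `margin` of a positive one.
            loss += max(0.0, margin - scores[p] + scores[n])
    return loss


def top_k(scores, k):
    """Indices of the k highest-scoring labels, best first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Toy example: 4 candidate tags, tags 0 and 2 are correct.
toy_scores = [3.0, 0.5, 2.5, -1.0]
toy_loss = pairwise_ranking_loss(toy_scores, positives=[0, 2])
toy_tags = top_k(toy_scores, k=2)
```

The loss is zero only when every correct tag outscores every incorrect tag by at least the margin, which is why ranking losses of this family suit annotation tasks where the top few predicted tags are what matters.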

    Large-scale image retrieval using similarity preserving binary codes

    Image retrieval is a fundamental problem in computer vision, and has many applications. When the dataset size gets very large, retrieving images in Internet image collections becomes very challenging. The challenges come from storage, computation speed, and similarity representation. My thesis addresses learning compact similarity preserving binary codes, which represent each image by a short binary string, for fast retrieval in large image databases. I will first present an approach called Iterative Quantization to convert high-dimensional vectors to compact binary codes, which works by learning a rotation to minimize the quantization error of mapping data to the vertices of a binary Hamming cube. This approach achieves state-of-the-art accuracy for preserving neighbors in the original feature space, as well as state-of-the-art semantic precision. Second, I will extend this approach to two different scenarios in large-scale recognition and retrieval problems. The first extension is aimed at high-dimensional histogram data, such as bag-of-words features or text documents. Such vectors are typically sparse and nonnegative. I develop an algorithm that explores the special structure of such data by mapping feature vectors to binary vertices in the positive orthant, which gives improved performance. The second extension is for Fisher Vectors, which are dense descriptors having tens of thousands to millions of dimensions. I develop a novel method for converting such descriptors to compact similarity-preserving binary codes that exploits their natural matrix structure to reduce their dimensionality using compact bilinear projections instead of a single large projection matrix. This method achieves retrieval and classification accuracy comparable to that of the original descriptors and to the state-of-the-art Product Quantization approach while having orders of magnitude faster code generation time and smaller memory footprint. 
    Finally, I present two applications of using Internet images and tags/labels to learn binary codes with label supervision, and show improved retrieval accuracy on several large Internet image datasets. First, I will present an application that performs cross-modal retrieval in the Hamming space. Then I will present an application on using supervised binary classeme representations for large-scale image retrieval. Doctor of Philosophy

    Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection

    Multi-label image classification is a fundamental but challenging task towards general visual understanding. Existing methods have found that region-level cues (e.g., features from RoIs) can facilitate multi-label classification. Nevertheless, such methods usually require laborious object-level annotations (i.e., object labels and bounding boxes) for effective learning of the object-level visual features. In this paper, we propose a novel and efficient deep framework to boost multi-label classification by distilling knowledge from a weakly-supervised detection task without bounding box annotations. Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module that guides the classification model by the WSD model according to the class-level predictions for the whole image and the object-level visual features for object RoIs. The WSD model is the teacher model and the classification model is the student model. After this cross-task knowledge distillation, the performance of the classification model is significantly improved and the efficiency is maintained since the WSD model can be safely discarded in the test phase. Extensive experiments on two large-scale datasets (MS-COCO and NUS-WIDE) show that our framework is superior to state-of-the-art methods in both performance and efficiency. Comment: accepted by ACM Multimedia 2018, 9 pages, 4 figures, 5 tables

    Experimental exploration of five-qubit quantum error correcting code with superconducting qubits

    Quantum error correction is an essential ingredient for universal quantum computing. Despite tremendous experimental efforts in the study of quantum error correction, to date there has been no demonstration of a universal quantum error correcting code with subsequent verification of all key features, including the identification of an arbitrary physical error, the capability for transversal manipulation of the logical state, and state decoding. To address this challenge, we experimentally realise the [[5,1,3]] code, the so-called smallest perfect code that permits corrections of generic single-qubit errors. In the experiment, having optimised the encoding circuit, we employ an array of superconducting qubits to realise the [[5,1,3]] code for several typical logical states including the magic state, an indispensable resource for realising non-Clifford gates. The encoded states are prepared with an average fidelity of 57.1(3)%, but with a high fidelity of 98.6(1)% within the code space. Then, the arbitrary single-qubit errors introduced manually are identified by measuring the stabilizers. We further implement logical Pauli operations with a fidelity of 97.2(2)% within the code space. Finally, we realise the decoding circuit and recover the input state with an overall fidelity of 74.5(6)%, using 92 gates in total. Our work demonstrates each key aspect of the [[5,1,3]] code and verifies the viability of experimental realization of quantum error correcting codes with superconducting qubits. Comment: 6 pages, 4 figures + Supplementary Material
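
The statement that single-qubit errors are identified by measuring the stabilizers can be illustrated with a small classical bookkeeping sketch. It assumes the commonly quoted cyclic generators XZZXI, IXZZX, XIXZZ, ZXIXZ for the [[5,1,3]] code; the experiment may use an equivalent set, and the helper names below are hypothetical. No quantum state is simulated: the code only tracks which stabilizers each Pauli error anticommutes with.

```python
# Assumed generator set (standard cyclic form of the five-qubit code).
STABILIZERS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]


def anticommute(p, q):
    """Single-qubit Paulis anticommute iff both are non-identity and different."""
    return p != "I" and q != "I" and p != q


def syndrome(error):
    """Bit i is 1 when the 5-qubit Pauli string `error` anticommutes with stabilizer i."""
    bits = []
    for stab in STABILIZERS:
        flips = sum(anticommute(e, s) for e, s in zip(error, stab))
        bits.append(flips % 2)
    return tuple(bits)


def single_qubit_errors(n=5):
    """Yield all 3 * n single-qubit Pauli errors as strings such as 'IIXII'."""
    for qubit in range(n):
        for pauli in "XYZ":
            yield "I" * qubit + pauli + "I" * (n - qubit - 1)


# Lookup table from measured syndrome to the error that caused it.
SYNDROME_TABLE = {syndrome(e): e for e in single_qubit_errors()}
```

The 15 single-qubit errors map to 15 distinct nonzero syndromes, which together with the all-zero "no error" syndrome use up all 16 outcomes of four stabilizer measurements. That exact fit is what "perfect code" refers to.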

    RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization

    We study an important yet largely unexplored problem of large-scale cross-modal visual localization by matching ground RGB images to a geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior works were demonstrated on small datasets and did not lend themselves to scaling up for large-scale applications. To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs (covering 143 km^2 area) of RGB and aerial LIDAR depth images. We propose a novel joint embedding based method that effectively combines the appearance and semantic cues from both modalities to handle drastic cross-modal variations. Experiments on the proposed dataset show that our model achieves a strong result of a median rank of 5 in matching across a large test set of 50K location pairs collected from a 14 km^2 area. This represents a significant advancement over prior works in performance and scale. We conclude with qualitative results to highlight the challenging nature of this task and the benefits of the proposed model. Our work provides a foundation for further research in cross-modal visual localization. Comment: ACM Multimedia 202

    Overall and cause-specific mortality rates among men and women with high exposure to indoor air pollution from the use of smoky and smokeless coal: a cohort study in Xuanwei, China

    OBJECTIVES: Never-smoking women in Xuanwei (XW), China, have some of the highest lung cancer rates in the country. This has been attributed to the combustion of smoky coal used for indoor cooking and heating. The aim of this study was to evaluate the spectrum of cause-specific mortality in this unique population, including among those who use smokeless coal, considered 'cleaner' coal in XW, as this has not been well-characterised. DESIGN: Cohort study. SETTING: XW, a rural region of China where residents routinely burn coal for indoor cooking and heating. PARTICIPANTS: Age-adjusted, cause-specific mortality rates between 1976 and 2011 were calculated and compared among lifetime smoky and smokeless coal users in a cohort of 42 420 men and women from XW. Mortality rates for XW women were compared with those for a cohort of predominately never-smoking women in Shanghai. RESULTS: Mortality in smoky coal users was driven by cancer (41%), with lung cancer accounting for 88% of cancer deaths. In contrast, cardiovascular disease (CVD) accounted for 32% of deaths among smokeless coal users, with 7% of deaths from cancer. Total cancer mortality was four times higher among smoky coal users relative to smokeless coal users, particularly for lung cancer (standardised rate ratio (SRR)=17.6). Smokeless coal users had higher mortality rates of CVD (SRR=2.9) and pneumonia (SRR=2.5) compared with smoky coal users. These patterns were similar in men and women, even though XW women rarely smoked cigarettes. Women in XW, regardless of coal type used, had over a threefold higher rate of overall mortality, and most cause-specific outcomes were elevated compared with women in Shanghai. CONCLUSIONS: Cause-specific mortality burden differs in XW based on the lifetime use of different coal types. These observations provide evidence that eliminating all coal use for indoor cooking and heating is an important next step in improving public health, particularly in developing countries.

    Comparing data-dependent and data-independent embeddings for classification and ranking of internet images

    This paper presents a comparative evaluation of feature embeddings for classification and ranking in large-scale Internet image datasets. We follow a popular framework for scalable visual learning, in which the data is first transformed by a nonlinear embedding and then an efficient linear classifier is trained in the resulting space. Our study includes data-dependent embeddings inspired by the semisupervised learning literature, and data-independent ones based on approximating specific kernels (such as the Gaussian kernel for GIST features and the histogram intersection kernel for bags of words). Perhaps surprisingly, we find that data-dependent embeddings, despite being computed from large amounts of unlabeled data, do not have any advantage over data-independent ones in the regime of scarce labeled data. On the other hand, we find that several data-dependent embeddings are competitive with popular data-independent choices for large-scale classification.
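
As an illustration of a data-independent embedding that approximates the Gaussian kernel, the sketch below uses random Fourier features. This is one well-known approximation; the abstract does not say which approximation the paper uses, so treat the choice, the function names, and the parameter `gamma` as assumptions made for the example.

```python
import math
import random


def gaussian_kernel(x, y, gamma):
    """Exact Gaussian kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def make_rff(dim, n_features, gamma, seed=0):
    """Build a map z with dot(z(x), z(y)) approximately equal to gaussian_kernel(x, y, gamma).

    The projection is drawn once from a fixed distribution and never looks at
    the data, which is what makes the embedding data-independent.
    """
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)  # frequencies are drawn from N(0, 2 * gamma * I)
    weights = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def embed(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return embed


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


embed = make_rff(dim=3, n_features=4000, gamma=0.5)
x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.1]
approx = dot(embed(x), embed(y))
exact = gaussian_kernel(x, y, gamma=0.5)
```

Once the data is embedded this way, a linear classifier trained on `embed(x)` behaves approximately like a kernel classifier, which is the "nonlinear embedding followed by an efficient linear classifier" framework described in the abstract.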

    Iterative quantization: A procrustean approach to learning binary codes

    This paper addresses the problem of learning similarity-preserving binary codes for efficient retrieval in large-scale image collections. We propose a simple and efficient alternating minimization scheme for finding a rotation of zero-centered data so as to minimize the quantization error of mapping this data to the vertices of a zero-centered binary hypercube. This method, dubbed iterative quantization (ITQ), has connections to multi-class spectral clustering and to the orthogonal Procrustes problem, and it can be used both with unsupervised data embeddings such as PCA and supervised embeddings such as canonical correlation analysis (CCA). Our experiments show that the resulting binary coding schemes decisively outperform several other state-of-the-art methods.
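
The alternating minimization described in this abstract can be sketched roughly as follows with NumPy. This is a reading of the abstract, not the authors' reference implementation: the random orthogonal initialization, the iteration count, and the function names are assumptions, and the input `V` is expected to be already zero-centered and reduced to one column per bit (for example by PCA).

```python
import numpy as np


def itq(V, n_iter=50, seed=0):
    """Alternate between binary codes B and rotation R to reduce ||B - V R||_F^2.

    V: (n, c) zero-centered data, one column per code bit.
    Returns B with entries in {-1, +1} and an orthogonal (c, c) matrix R.
    """
    rng = np.random.default_rng(seed)
    c = V.shape[1]
    # Assumed initialization: a random orthogonal matrix obtained via QR.
    R, _ = np.linalg.qr(rng.standard_normal((c, c)))
    for _ in range(n_iter):
        # Fix R, update B: snap each rotated point to the nearest hypercube vertex.
        B = np.where(V @ R >= 0, 1.0, -1.0)
        # Fix B, update R: orthogonal Procrustes solution from the SVD of V^T B.
        U, _, Wt = np.linalg.svd(V.T @ B)
        R = U @ Wt
    B = np.where(V @ R >= 0, 1.0, -1.0)
    return B, R


def quantization_error(V, B, R):
    """Squared Frobenius distance between the codes and the rotated data."""
    return float(np.linalg.norm(B - V @ R) ** 2)
```

Each of the two updates is the exact minimizer of the quantization error with the other variable held fixed, so the error never increases from one iteration to the next. Only a local minimum is guaranteed.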